Chapter 2.3: List Searching
Linear Search: The Foundation
Before we can search complex structures, we need to master searching simple lists. Linear search, which checks each element one by one, is the most fundamental search algorithm. While it might seem trivial, implementing it recursively reveals core patterns we'll use throughout this chapter.
Let's start with a concrete problem: finding a specific file by name in a list of filenames.
The Reference Implementation: Finding a Single Match
Here's our anchor example: a function that searches for a filename in a list:
def find_file(filenames, target):
    """
    Find the index of target filename in the list.
    Returns -1 if not found.
    Examples:
    >>> files = ["config.json", "data.csv", "README.md", "app.py"]
    >>> find_file(files, "data.csv")
    1
    >>> find_file(files, "missing.txt")
    -1
    """
    # Base case: empty list means not found
    if len(filenames) == 0:
        return -1
    # Check first element
    if filenames[0] == target:
        return 0  # Found at index 0
    # Recursive case: search the rest
    result = find_file(filenames[1:], target)
    # If found in rest, adjust index
    if result == -1:
        return -1
    else:
        return result + 1  # Add 1 because we skipped the first element

# Test the function
files = ["config.json", "data.csv", "README.md", "app.py"]
print("Searching for 'data.csv':")
print(f"Index: {find_file(files, 'data.csv')}")
print()
print("Searching for 'README.md':")
print(f"Index: {find_file(files, 'README.md')}")
print()
print("Searching for 'missing.txt':")
print(f"Index: {find_file(files, 'missing.txt')}")
Python Output:
Searching for 'data.csv':
Index: 1
Searching for 'README.md':
Index: 2
Searching for 'missing.txt':
Index: -1
How It Works:
The function follows the "first + rest" pattern:
1. Base case: Empty list → return -1 (not found)
2. Check first: If filenames[0] matches target → return 0
3. Recurse on rest: Search filenames[1:]
4. Adjust index: If found in rest, add 1 to account for the skipped element
Execution Trace for find_file(["config.json", "data.csv", "README.md"], "data.csv"):
find_file(["config.json", "data.csv", "README.md"], "data.csv")
→ first="config.json" ≠ "data.csv"
→ find_file(["data.csv", "README.md"], "data.csv")
  → first="data.csv" == "data.csv" ✓
  → return 0
→ return 0 + 1 = 1
Result: 1
The trace shows why the answer is 1: the match sits at index 0 of the rest, and the unwinding adds 1 for the skipped first element. To watch every call as it happens, we can instrument the function with debug output:
# Let's add debug output to see what's happening
def find_file_debug(filenames, target, depth=0):
    """Version with debug output to trace execution."""
    indent = "  " * depth
    print(f"{indent}find_file({filenames}, '{target}')")
    if len(filenames) == 0:
        print(f"{indent}→ Base case: empty list, return -1")
        return -1
    if filenames[0] == target:
        print(f"{indent}→ Found '{target}' at index 0")
        return 0
    print(f"{indent}→ '{filenames[0]}' ≠ '{target}', searching rest...")
    result = find_file_debug(filenames[1:], target, depth + 1)
    if result == -1:
        print(f"{indent}→ Not found in rest, return -1")
        return -1
    else:
        adjusted = result + 1
        print(f"{indent}→ Found at index {result} in rest, return {adjusted}")
        return adjusted

# Test with debug output
files = ["config.json", "data.csv", "README.md", "app.py"]
print("=== Searching for 'data.csv' ===")
index = find_file_debug(files, "data.csv")
print(f"\nFinal result: {index}\n")
Python Output:
=== Searching for 'data.csv' ===
find_file(['config.json', 'data.csv', 'README.md', 'app.py'], 'data.csv')
→ 'config.json' ≠ 'data.csv', searching rest...
  find_file(['data.csv', 'README.md', 'app.py'], 'data.csv')
  → Found 'data.csv' at index 0
→ Found at index 0 in rest, return 1
Final result: 1
The debug trace confirms the index is 1. Let's verify against the original function:
files = ["config.json", "data.csv", "README.md", "app.py"]
print(f"Index of 'data.csv': {find_file(files, 'data.csv')}")
print(f"Verification: files[1] = '{files[1]}'")
Python Output:
Index of 'data.csv': 1
Verification: files[1] = 'data.csv'
Perfect! The function works correctly. Now let's understand its limitations.
Current Limitation: Only Finds First Occurrence
Our function stops at the first match. But what if we have duplicate filenames and need to find all of them?
# Scenario: duplicate filenames in different directories
files = ["config.json", "data.csv", "config.json", "README.md", "config.json"]
print("Files:", files)
print(f"First 'config.json' at index: {find_file(files, 'config.json')}")
print("\nBut there are actually 3 occurrences at indices: 0, 2, 4")
Python Output:
Files: ['config.json', 'data.csv', 'config.json', 'README.md', 'config.json']
First 'config.json' at index: 0
But there are actually 3 occurrences at indices: 0, 2, 4
The Problem: We need a way to find ALL occurrences, not just the first one.
What we need: A function that returns a list of all matching indices.
Finding All Occurrences
Iteration 1: Attempting to Collect Indices
Let's try to modify our function to collect all matching indices:
def find_all_files_v1(filenames, target):
    """
    Attempt to find all indices where target appears.
    WARNING: This version has a bug!
    """
    # Base case: empty list
    if len(filenames) == 0:
        return []
    # Check first element
    if filenames[0] == target:
        # Found a match! Now search rest and add this index
        rest_results = find_all_files_v1(filenames[1:], target)
        return [0] + rest_results  # Add current index to results
    else:
        # Not a match, just search rest
        return find_all_files_v1(filenames[1:], target)

# Test it
files = ["config.json", "data.csv", "config.json", "README.md", "config.json"]
result = find_all_files_v1(files, "config.json")
print(f"Found 'config.json' at indices: {result}")
print(f"Expected: [0, 2, 4]")
Python Output:
Found 'config.json' at indices: [0, 0, 0]
Expected: [0, 2, 4]
Diagnostic Analysis: Understanding the Failure
The incorrect output:
Found 'config.json' at indices: [0, 0, 0]
Expected: [0, 2, 4]
Let's parse this section by section:
- The incorrect output: [0, 0, 0] instead of [0, 2, 4]
  - What this tells us: We're finding the right NUMBER of matches (3), but all indices are 0
- The pattern in the wrong answer:
  - All indices are 0
  - This suggests we're not adjusting indices as we recurse deeper
  - Each recursive call is returning 0 for "found at current position"
- Tracing the logic manually:
Let's trace find_all_files_v1(["config.json", "data.csv", "config.json"], "config.json"):
Call 1: find_all_files_v1(["config.json", "data.csv", "config.json"], "config.json")
        → filenames[0] = "config.json" == target ✓
        → Recurse on ["data.csv", "config.json"]
Call 2: find_all_files_v1(["data.csv", "config.json"], "config.json")
        → filenames[0] = "data.csv" ≠ target
        → Recurse on ["config.json"]
Call 3: find_all_files_v1(["config.json"], "config.json")
        → filenames[0] = "config.json" == target ✓
        → Recurse on []
Call 4: find_all_files_v1([], "config.json")
        → Base case: return []
Call 3 returns: [0] + [] = [0]
Call 2 returns: [0] (no adjustment!)
Call 1 returns: [0] + [0] = [0, 0]
Final: [0, 0]
- Root cause identified: When we find a match and recurse, we return [0] + rest_results. But rest_results contains indices relative to the REST of the list, not the original list. We need to adjust those indices.
Why the current approach can't solve this: We're treating all matches as if they're at index 0 of their respective sublists, but we never account for how many elements we've already skipped.
What we need: A way to track our current position and adjust indices accordingly.
Iteration 2: Adding Position Tracking
Before (Iteration 1):
def find_all_files_v1(filenames, target):
    if len(filenames) == 0:
        return []
    if filenames[0] == target:
        rest_results = find_all_files_v1(filenames[1:], target)
        return [0] + rest_results  # ✗ Always returns 0 for current match
    else:
        return find_all_files_v1(filenames[1:], target)
Problem: Indices aren't adjusted for position in original list.
After (Iteration 2):
def find_all_files_v2(filenames, target, current_index=0):
    """
    Find all indices where target appears.
    Uses current_index parameter to track position.
    """
    # Base case: empty list
    if len(filenames) == 0:
        return []
    # Check first element
    if filenames[0] == target:
        # Found a match at current_index
        rest_results = find_all_files_v2(filenames[1:], target, current_index + 1)
        return [current_index] + rest_results
    else:
        # Not a match, continue searching
        return find_all_files_v2(filenames[1:], target, current_index + 1)

# Test it
files = ["config.json", "data.csv", "config.json", "README.md", "config.json"]
result = find_all_files_v2(files, "config.json")
print(f"Found 'config.json' at indices: {result}")
print(f"Expected: [0, 2, 4]")
print()
# Verify each index
for idx in result:
    print(f"files[{idx}] = '{files[idx]}'")
Python Output:
Found 'config.json' at indices: [0, 2, 4]
Expected: [0, 2, 4]
files[0] = 'config.json'
files[2] = 'config.json'
files[4] = 'config.json'
Improvement: Now correctly tracks position using an accumulator parameter current_index.
Execution Trace:
find_all_files_v2(["config.json", "data.csv", "config.json", "README.md", "config.json"],
                  "config.json", current_index=0)
→ filenames[0]="config.json" == target, current_index=0
→ find_all_files_v2(["data.csv", "config.json", "README.md", "config.json"],
                    "config.json", current_index=1)
  → filenames[0]="data.csv" ≠ target, current_index=1
  → find_all_files_v2(["config.json", "README.md", "config.json"],
                      "config.json", current_index=2)
    → filenames[0]="config.json" == target, current_index=2
    → find_all_files_v2(["README.md", "config.json"],
                        "config.json", current_index=3)
      → filenames[0]="README.md" ≠ target, current_index=3
      → find_all_files_v2(["config.json"], "config.json", current_index=4)
        → filenames[0]="config.json" == target, current_index=4
        → find_all_files_v2([], "config.json", current_index=5)
          → Base case: return []
        → return [4] + [] = [4]
      → return [4]
    → return [2] + [4] = [2, 4]
  → return [2, 4]
→ return [0] + [2, 4] = [0, 2, 4]
Result: [0, 2, 4]
The Accumulator Pattern
This solution introduces the accumulator pattern: using an additional parameter to track state across recursive calls. Here, current_index accumulates our position as we traverse the list.
Key insight: When you need to track "where you are" in a recursive traversal, add a parameter that increments with each recursive call.
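The pattern isn't specific to searching. As a second illustration (our own sketch, not part of the chapter's running example), here is a recursive sum that carries its running total in an accumulator parameter:

```python
def sum_list(nums, total=0):
    """Recursively sum a list, carrying the running total in an accumulator."""
    # Base case: nothing left to add - the accumulator already holds the answer
    if len(nums) == 0:
        return total
    # Recursive case: fold the first element into the accumulator
    return sum_list(nums[1:], total + nums[0])

print(sum_list([3, 1, 4, 1, 5]))  # 14
```

The accumulator moves the "work so far" down into the recursive call, so the base case can return the finished answer directly instead of combining results on the way back up.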
Alternative Approach: Adjusting Indices Without Extra Parameter
We can also solve this by adjusting indices from the recursive results:
def find_all_files_v3(filenames, target):
    """
    Find all indices without extra parameter.
    Adjusts indices from recursive results.
    """
    # Base case: empty list
    if len(filenames) == 0:
        return []
    # Recursive case: search the rest first
    rest_results = find_all_files_v3(filenames[1:], target)
    # Adjust all indices from rest (they're off by 1)
    adjusted_results = [idx + 1 for idx in rest_results]
    # Check first element
    if filenames[0] == target:
        return [0] + adjusted_results
    else:
        return adjusted_results

# Test it
files = ["config.json", "data.csv", "config.json", "README.md", "config.json"]
result = find_all_files_v3(files, "config.json")
print(f"Found 'config.json' at indices: {result}")
Python Output:
Found 'config.json' at indices: [0, 2, 4]
How this works:
1. Recurse on rest FIRST (before checking the current element)
2. Get indices from rest (these are relative to the rest)
3. Adjust ALL indices by adding 1 (to account for the skipped first element)
4. If the first element matches, prepend 0 to the adjusted results
Execution Trace:
find_all_files_v3(["config.json", "data.csv", "config.json"], "config.json")
→ Recurse on ["data.csv", "config.json"]
  find_all_files_v3(["data.csv", "config.json"], "config.json")
  → Recurse on ["config.json"]
    find_all_files_v3(["config.json"], "config.json")
    → Recurse on []
      find_all_files_v3([], "config.json")
      → Base case: return []
    → rest_results = []
    → adjusted = []
    → "config.json" == target, return [0] + [] = [0]
  → rest_results = [0]
  → adjusted = [1] (added 1 to each)
  → "data.csv" ≠ target, return [1]
→ rest_results = [1]
→ adjusted = [2] (added 1 to each)
→ "config.json" == target, return [0] + [2] = [0, 2]
Result: [0, 2]
Comparing the Two Approaches
| Aspect | v2 (Accumulator) | v3 (Adjust Results) |
|---|---|---|
| Extra parameter | Yes (current_index) | No |
| When indices computed | During recursion down | During recursion up |
| List comprehension | No | Yes (to adjust indices) |
| Conceptual model | Track position as we go | Fix indices on return |
| Efficiency | O(n) time, O(n) space | O(n) time, O(n) space |
Both are valid! The accumulator version (v2) is often clearer because it makes the position tracking explicit.
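Since the two versions should agree everywhere, we can cross-check them against a simple comprehension on random inputs. This is a quick sketch with condensed redefinitions of v2 and v3 (the helper names are ours):

```python
import random

def find_all_v2(items, target, current_index=0):
    """Accumulator version (v2), condensed."""
    if not items:
        return []
    rest = find_all_v2(items[1:], target, current_index + 1)
    return ([current_index] + rest) if items[0] == target else rest

def find_all_v3(items, target):
    """Index-adjusting version (v3), condensed."""
    if not items:
        return []
    adjusted = [i + 1 for i in find_all_v3(items[1:], target)]
    return ([0] + adjusted) if items[0] == target else adjusted

# Compare both against an enumerate-based reference on 100 random lists
random.seed(0)
for _ in range(100):
    xs = [random.choice("abc") for _ in range(random.randrange(10))]
    expected = [i for i, x in enumerate(xs) if x == "a"]
    assert find_all_v2(xs, "a") == expected
    assert find_all_v3(xs, "a") == expected
print("v2 and v3 agree on 100 random lists")
```

Randomized cross-checking like this is a cheap way to gain confidence that two implementations really are interchangeable.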
Complexity Analysis
Time Complexity: O(n)
- Visit each element exactly once
- Single recursive call per element
- Depth of recursion: n
Space Complexity: O(n)
- Call stack depth: n
- Result list: O(k), where k = number of matches
- Total: O(n) for the call stack
Recurrence Relation: T(n) = T(n-1) + O(1)
- Solves to O(n) by telescoping
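One practical caveat worth adding here: CPython caps recursion depth (1000 frames by default), so this O(n) recursion depth means the functions fail outright on long lists. A sketch demonstrating the limit with a condensed copy of find_file (the exact limit depends on your interpreter settings):

```python
import sys

def find_file(filenames, target):
    """Condensed copy of the chapter's recursive linear search."""
    if not filenames:
        return -1
    if filenames[0] == target:
        return 0
    rest = find_file(filenames[1:], target)
    return -1 if rest == -1 else rest + 1

# One recursive call per element: a 10,000-element list needs ~10,000 frames,
# which blows past CPython's default limit (sys.getrecursionlimit(), usually 1000)
big = [f"file{i}.txt" for i in range(10_000)]
try:
    find_file(big, "missing.txt")
except RecursionError:
    print("RecursionError: list is longer than the recursion limit allows")
# sys.setrecursionlimit() can raise the cap, but for large inputs an
# iterative loop is the robust fix
```

This is why recursive linear search is a teaching device in Python rather than a production technique; the iterative equivalent has no depth limit.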
Current Limitation: Only Works on Flat Lists
Both versions work great for simple lists. But what if our filenames are organized in nested structures?
# Scenario: nested directory structure
file_structure = [
    "config.json",
    ["subdir1", ["data.csv", "config.json"]],
    "README.md",
    ["subdir2", ["config.json", "app.py"]]
]
print("File structure:", file_structure)
print("\nTrying to find 'config.json' with our current function...")
result = find_all_files_v2(file_structure, "config.json")
print(f"Result: {result}")
Python Output:
File structure: ['config.json', ['subdir1', ['data.csv', 'config.json']], 'README.md', ['subdir2', ['config.json', 'app.py']]]
Trying to find 'config.json' with our current function...
Result: [0]
Diagnostic Analysis: Understanding the Failure
No exception is raised, but the result is silently wrong:
Result: [0]
Let's parse this section by section:
- The incorrect output: [0] instead of all three occurrences
  - What this tells us: Only the top-level match was found; the matches inside both subdirectories were missed entirely
- Where it fails:
  - At the line: if filenames[0] == target:
  - When filenames[0] is ['subdir1', ['data.csv', 'config.json']], comparing a list to a string simply evaluates to False
  - The function treats the whole subdirectory as "not a match" and skips past it without ever looking inside
- Root cause identified: Our function assumes all elements are strings, but nested structures contain both strings AND lists.
Why the current approach can't solve this: We need to handle two types of elements:
- Strings (filenames): compare directly
- Lists (subdirectories): recurse into them
What we need: A way to detect each element's type and handle it appropriately.
Searching Nested Lists
The Challenge of Arbitrary Depth
Nested lists introduce a new dimension of complexity: arbitrary depth. We don't know how many levels deep the nesting goes, and we need to search ALL levels.
This is our first taste of structural recursion: recursion that follows the structure of the data itself.
Iteration 3: Handling Nested Structures
Let's modify our function to handle both strings and nested lists:
def find_in_nested_v1(structure, target):
    """
    Search for target in a nested list structure.
    Returns list of "paths" to each occurrence.
    A path is a list of indices showing how to reach the item.
    Example: [1, 2] means structure[1][2]
    """
    results = []
    # Iterate through each element with its index
    for i, element in enumerate(structure):
        if isinstance(element, list):
            # Element is a list - recurse into it
            nested_results = find_in_nested_v1(element, target)
            # Prepend current index to each path found
            for path in nested_results:
                results.append([i] + path)
        else:
            # Element is not a list - check if it matches
            if element == target:
                results.append([i])
    return results

# Test it
file_structure = [
    "config.json",                             # index 0
    ["subdir1", ["data.csv", "config.json"]],  # index 1
    "README.md",                               # index 2
    ["subdir2", ["config.json", "app.py"]]     # index 3
]
print("File structure:")
print(file_structure)
print()
paths = find_in_nested_v1(file_structure, "config.json")
print(f"Found 'config.json' at paths: {paths}")
print()
# Verify each path
for path in paths:
    # Navigate to the item using the path
    item = file_structure
    for idx in path:
        item = item[idx]
    print(f"Path {path} → '{item}'")
Python Output:
File structure:
['config.json', ['subdir1', ['data.csv', 'config.json']], 'README.md', ['subdir2', ['config.json', 'app.py']]]
Found 'config.json' at paths: [[0], [1, 1, 1], [3, 1, 0]]
Path [0] → 'config.json'
Path [1, 1, 1] → 'config.json'
Path [3, 1, 0] → 'config.json'
How It Works:
This function uses a different recursive pattern:
- Iterate through elements (not first + rest)
- Type check each element:
  - If it's a list → recurse into it
  - If it's not a list → check if it matches target
- Build paths by prepending the current index to nested results
Execution Trace for Simplified Structure:
Let's trace find_in_nested_v1([["a", "b"], "b", ["b"]], "b"):
find_in_nested_v1([["a", "b"], "b", ["b"]], "b")
→ i=0, element=["a", "b"] (is list)
  → find_in_nested_v1(["a", "b"], "b")
    → i=0, element="a" (not list, not match)
    → i=1, element="b" (not list, MATCH!)
    → return [[1]]
  → nested_results = [[1]]
  → prepend 0: [[0, 1]]
→ i=1, element="b" (not list, MATCH!)
  → append [1]
→ i=2, element=["b"] (is list)
  → find_in_nested_v1(["b"], "b")
    → i=0, element="b" (not list, MATCH!)
    → return [[0]]
  → nested_results = [[0]]
  → prepend 2: [[2, 0]]
→ return [[0, 1], [1], [2, 0]]
Result: [[0, 1], [1], [2, 0]]
Understanding Path Representation
A path is a sequence of indices that tells you how to navigate to an item:
- [0] means "item at index 0 of the top-level list"
- [1, 1, 1] means "go to index 1, then index 1 of that sublist, then index 1 of that sublist"
- [3, 1, 0] means "go to index 3, then index 1 of that sublist, then index 0 of that sublist"
This is more useful than flat indices because it preserves the structure information.
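Following a path is just repeated indexing, which can be written as a fold over the path. A small sketch (get_by_path is our own helper name, not from the chapter):

```python
from functools import reduce
from operator import getitem

def get_by_path(structure, path):
    """Follow a list of indices down into a nested list, one index at a time."""
    return reduce(getitem, path, structure)

file_structure = [
    "config.json",
    ["subdir1", ["data.csv", "config.json"]],
    "README.md",
    ["subdir2", ["config.json", "app.py"]],
]
print(get_by_path(file_structure, [1, 1, 1]))  # config.json
print(get_by_path(file_structure, [3, 1, 1]))  # app.py
```

reduce applies getitem once per path element, so `get_by_path(s, [1, 1, 1])` is exactly `s[1][1][1]`.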
Visualizing the Recursion Tree
For file_structure searching for "config.json":
find_in_nested_v1([...], "config.json")
├─ i=0: "config.json" → MATCH! → [0]
├─ i=1: ["subdir1", [...]]
│  ├─ i=0: "subdir1" → no match
│  └─ i=1: ["data.csv", "config.json"]
│     ├─ i=0: "data.csv" → no match
│     └─ i=1: "config.json" → MATCH! → [1] → prepend 1 → [1,1] → prepend 1 → [1,1,1]
├─ i=2: "README.md" → no match
└─ i=3: ["subdir2", [...]]
   ├─ i=0: "subdir2" → no match
   └─ i=1: ["config.json", "app.py"]
      ├─ i=0: "config.json" → MATCH! → [0] → prepend 1 → [1,0] → prepend 3 → [3,1,0]
      └─ i=1: "app.py" → no match
Results: [[0], [1,1,1], [3,1,0]]
Current Limitation: Not Purely Recursive
Notice that this implementation uses a for loop. While it works, it's not following the pure "first + rest" recursive pattern we've been learning. Can we make it more recursive?
Iteration 4: Pure Recursive Approach
Let's rewrite using the first + rest pattern:
def find_in_nested_v2(structure, target):
    """
    Pure recursive version using first + rest pattern.
    Returns list of paths to each occurrence.
    """
    # Base case: empty structure
    if len(structure) == 0:
        return []
    # Get first element and rest
    first = structure[0]
    rest = structure[1:]
    # Results from searching the rest
    rest_results = find_in_nested_v2(rest, target)
    # Adjust rest results (the first index of each path is off by 1)
    adjusted_rest = [[idx + 1] + path for idx, *path in rest_results]
    # Handle first element
    if isinstance(first, list):
        # First is a list - recurse into it
        nested_results = find_in_nested_v2(first, target)
        # Prepend 0 to each nested path
        first_results = [[0] + path for path in nested_results]
        return first_results + adjusted_rest
    else:
        # First is not a list - check if it matches
        if first == target:
            return [[0]] + adjusted_rest
        else:
            return adjusted_rest

# Test it
file_structure = [
    "config.json",
    ["subdir1", ["data.csv", "config.json"]],
    "README.md",
    ["subdir2", ["config.json", "app.py"]]
]
paths = find_in_nested_v2(file_structure, "config.json")
print(f"Found 'config.json' at paths: {paths}")
print()
# Verify
for path in paths:
    item = file_structure
    for idx in path:
        item = item[idx]
    print(f"Path {path} → '{item}'")
Python Output:
Found 'config.json' at paths: [[0], [1, 1, 1], [3, 1, 0]]
Path [0] → 'config.json'
Path [1, 1, 1] → 'config.json'
Path [3, 1, 0] → 'config.json'
How This Version Works:
1. Base case: Empty structure → return []
2. Split: Get first and rest
3. Recurse on rest: Get all matches from rest
4. Adjust rest indices: Add 1 to the first index of each path (because rest starts at index 1 of the original)
5. Handle first:
   - If it's a list → recurse into it, prepend 0 to the results
   - If it matches → return [[0]] plus the adjusted rest
   - If it doesn't match → return the adjusted rest
Comparing the Two Nested Approaches
| Aspect | v1 (Loop-based) | v2 (Pure Recursive) |
|---|---|---|
| Uses loops | Yes (for loop) | No |
| Recursive pattern | Structural recursion | First + rest |
| Readability | More intuitive | More "functional" |
| Efficiency | Same O(n) | Same O(n) |
| Pythonic | More idiomatic | More academic |
Honest assessment: The loop-based version (v1) is clearer and more Pythonic. The pure recursive version (v2) is interesting academically but doesn't provide practical benefits in Python.
When to use each:
- v1 (loop-based): Production code, when clarity matters
- v2 (pure recursive): Learning recursion patterns, functional programming contexts
Complexity Analysis for Nested Search
Time Complexity: O(n)
- Where n = total number of elements at all nesting levels
- Each element is visited exactly once
- Type checking is O(1)
Space Complexity: O(d + k)
- d = maximum depth of nesting (call stack)
- k = number of matches (result list)
- Worst case: O(n) if deeply nested or many matches
Key Insight: Nested recursion has TWO dimensions:
1. Horizontal: Processing elements at the same level (first + rest)
2. Vertical: Descending into nested structures (recursive call on sublists)
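The vertical dimension (the depth d) can itself be computed recursively. A small sketch (max_depth is our own helper, not from the chapter):

```python
def max_depth(structure):
    """Return nesting depth: 1 for a flat list, plus 1 per level of sublists."""
    # Vertical recursion: descend into each sublist and keep the deepest result
    depths = [max_depth(el) for el in structure if isinstance(el, list)]
    return 1 + (max(depths) if depths else 0)

nested = ["a", ["b", ["c", "d"]], "e"]
print(max_depth(nested))  # 3
```

Note the shape: the horizontal dimension is the comprehension over elements, and the vertical dimension is the recursive call on each sublist, exactly mirroring the two dimensions above.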
Real-World Application: File System Search
Let's apply this to a realistic scenario: searching a file system:
def find_files_by_extension(directory_structure, extension):
    """
    Find all files with given extension in nested directory structure.
    Structure format:
    - Strings represent files
    - Lists represent directories (first element is the directory name)
    Returns: List of paths to matching files
    """
    results = []
    for i, item in enumerate(directory_structure):
        if isinstance(item, list):
            # It's a directory - recurse into it
            nested_results = find_files_by_extension(item, extension)
            for path in nested_results:
                results.append([i] + path)
        elif isinstance(item, str):
            # It's a file - check extension
            if item.endswith(extension):
                results.append([i])
    return results
# Realistic file system structure
project = [
    "README.md",
    "setup.py",
    ["src", [
        "main.py",
        "utils.py",
        ["models", [
            "user.py",
            "product.py"
        ]]
    ]],
    ["tests", [
        "test_main.py",
        "test_utils.py",
        ["fixtures", [
            "data.json",
            "config.yaml"
        ]]
    ]],
    ["docs", [
        "guide.md",
        "api.md"
    ]]
]
def readable_path(structure, path):
    """Convert an index path into a human-readable 'dir/file' string."""
    parts = []
    temp = structure
    inside_directory = False
    for idx in path:
        node = temp[idx]
        if isinstance(node, list):
            if inside_directory:
                # The contents list inside a ["name", [contents]] wrapper -
                # it contributes nothing to the readable path
                inside_directory = False
            else:
                parts.append(node[0])  # directory name
                inside_directory = True
        else:
            parts.append(node)  # file name
        temp = node
    return "/".join(parts)

print("=== Finding all .py files ===")
py_files = find_files_by_extension(project, ".py")
print(f"Found {len(py_files)} Python files:")
for path in py_files:
    print(f"  {readable_path(project, path)}")

print("\n=== Finding all .md files ===")
md_files = find_files_by_extension(project, ".md")
print(f"Found {len(md_files)} Markdown files:")
for path in md_files:
    print(f"  {readable_path(project, path)}")
Python Output:
=== Finding all .py files ===
Found 7 Python files:
  setup.py
  src/main.py
  src/utils.py
  src/models/user.py
  src/models/product.py
  tests/test_main.py
  tests/test_utils.py
=== Finding all .md files ===
Found 3 Markdown files:
  README.md
  docs/guide.md
  docs/api.md
This demonstrates how nested list recursion maps directly to real-world hierarchical structures like file systems!
Binary Search on Sorted Lists
From Linear to Logarithmic
All our search functions so far have been linear search: checking each element one by one. This is O(n) time complexity.
But when the list is sorted, we can do much better using binary search, which runs in O(log n) time.
The key insight: In a sorted list, we can eliminate half the search space with each comparison.
The Binary Search Algorithm
Strategy:
1. Look at the middle element
2. If it matches → found!
3. If target is smaller → search the left half
4. If target is larger → search the right half
5. Repeat until found or the search space is empty
Iteration 1: Basic Binary Search
Let's implement binary search recursively:
def binary_search_v1(sorted_list, target):
    """
    Binary search for target in sorted list.
    Returns index if found, -1 if not found.
    WARNING: This version has a subtle bug!
    """
    # Base case: empty list
    if len(sorted_list) == 0:
        return -1
    # Find middle index
    mid = len(sorted_list) // 2
    mid_value = sorted_list[mid]
    # Check middle element
    if mid_value == target:
        return mid
    elif target < mid_value:
        # Search left half
        return binary_search_v1(sorted_list[:mid], target)
    else:
        # Search right half
        return binary_search_v1(sorted_list[mid+1:], target)

# Test it
numbers = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
print(f"Sorted list: {numbers}")
print()
print(f"Searching for 7: {binary_search_v1(numbers, 7)}")
print(f"Searching for 15: {binary_search_v1(numbers, 15)}")
print(f"Searching for 1: {binary_search_v1(numbers, 1)}")
print(f"Searching for 19: {binary_search_v1(numbers, 19)}")
print(f"Searching for 8 (not in list): {binary_search_v1(numbers, 8)}")
Python Output:
Sorted list: [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
Searching for 7: 0
Searching for 15: 1
Searching for 1: 0
Searching for 19: 0
Searching for 8 (not in list): -1
Diagnostic Analysis: Understanding the Failure
The output shows incorrect indices! Let's verify:
numbers = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
print(f"numbers[0] = {numbers[0]} (searched for 7) ✗ - 7 is actually at index 3!")
print(f"numbers[1] = {numbers[1]} (searched for 15) ✗ - 15 is actually at index 7!")
print(f"numbers[0] = {numbers[0]} (searched for 1) ✓")
print(f"numbers[0] = {numbers[0]} (searched for 19) ✗ - 19 is actually at index 9!")
Python Output:
numbers[0] = 1 (searched for 7) ✗ - 7 is actually at index 3!
numbers[1] = 3 (searched for 15) ✗ - 15 is actually at index 7!
numbers[0] = 1 (searched for 1) ✓
numbers[0] = 1 (searched for 19) ✗ - 19 is actually at index 9!
The complete execution trace:
Let's trace binary_search_v1([1, 3, 5, 7, 9, 11, 13, 15, 17, 19], 15):
Call 1: binary_search_v1([1, 3, 5, 7, 9, 11, 13, 15, 17, 19], 15)
        mid = 10 // 2 = 5
        mid_value = sorted_list[5] = 11
        15 > 11, search right half
Call 2: binary_search_v1([13, 15, 17, 19], 15)
        mid = 4 // 2 = 2
        mid_value = sorted_list[2] = 17
        15 < 17, search left half
Call 3: binary_search_v1([13, 15], 15)
        mid = 2 // 2 = 1
        mid_value = sorted_list[1] = 15
        15 == 15, FOUND!
        return 1
Return 1 (but 15 is actually at index 7 in the original list!)
Let's parse this section by section:
- The error pattern: Each search returns the index within the final sublist, not the original list
  - 7 (index 3) → returns 0, its index in the sublist [7]
  - 15 (index 7) → returns 1, its index in the sublist [13, 15]
  - 1 (index 0) → returns 0, accidentally correct, since 1 stays at the front of every left slice
  - 19 (index 9) → returns 0, its index in the sublist [19]
- Root cause identified: When we recurse on a sublist, we return the index within that sublist, not the original list.
- The problem:
  - We slice the list: sorted_list[mid+1:]
  - This creates a NEW list starting at index 0
  - When we find the target, we return its index in the NEW list
  - But we need its index in the ORIGINAL list
Why the current approach can't solve this: We're losing track of where we are in the original list.
What we need: Track the offset into the original list, similar to our current_index parameter in earlier examples.
Iteration 2: Tracking Original Indices
Before (Iteration 1):
def binary_search_v1(sorted_list, target):
    if len(sorted_list) == 0:
        return -1
    mid = len(sorted_list) // 2
    mid_value = sorted_list[mid]
    if mid_value == target:
        return mid  # ✗ Returns index in current sublist
    elif target < mid_value:
        return binary_search_v1(sorted_list[:mid], target)
    else:
        return binary_search_v1(sorted_list[mid+1:], target)  # ✗ Loses offset
Problem: Returns indices relative to sublists, not original list.
After (Iteration 2):
def binary_search_v2(sorted_list, target, offset=0):
    """
    Binary search with offset tracking.
    Returns index in original list.
    """
    # Base case: empty list
    if len(sorted_list) == 0:
        return -1
    # Find middle index
    mid = len(sorted_list) // 2
    mid_value = sorted_list[mid]
    # Check middle element
    if mid_value == target:
        return offset + mid  # ✓ Add offset to get original index
    elif target < mid_value:
        # Search left half (offset stays the same)
        return binary_search_v2(sorted_list[:mid], target, offset)
    else:
        # Search right half (offset increases)
        new_offset = offset + mid + 1
        return binary_search_v2(sorted_list[mid+1:], target, new_offset)

# Test it
numbers = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
print(f"Sorted list: {numbers}")
print()
test_values = [7, 15, 1, 19, 8]
for val in test_values:
    idx = binary_search_v2(numbers, val)
    if idx != -1:
        print(f"Searching for {val}: index {idx} → numbers[{idx}] = {numbers[idx]} ✓")
    else:
        print(f"Searching for {val}: not found ✗")
Python Output:
Sorted list: [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
Searching for 7: index 3 → numbers[3] = 7 ✓
Searching for 15: index 7 → numbers[7] = 15 ✓
Searching for 1: index 0 → numbers[0] = 1 ✓
Searching for 19: index 9 → numbers[9] = 19 ✓
Searching for 8: not found ✗
Improvement: Now correctly returns indices in the original list by tracking offset.
Execution Trace for binary_search_v2([1, 3, 5, 7, 9, 11, 13, 15, 17, 19], 15, offset=0):
Call 1: binary_search_v2([1, 3, 5, 7, 9, 11, 13, 15, 17, 19], 15, offset=0)
        mid = 5, mid_value = 11
        15 > 11, search right
        new_offset = 0 + 5 + 1 = 6
Call 2: binary_search_v2([13, 15, 17, 19], 15, offset=6)
        mid = 2, mid_value = 17
        15 < 17, search left
        offset stays 6
Call 3: binary_search_v2([13, 15], 15, offset=6)
        mid = 1, mid_value = 15
        15 == 15, FOUND!
        return 6 + 1 = 7 ✓
Return 7 (correct!)
Visualizing Binary Search
Let's visualize how binary search eliminates half the search space each time:
Searching for 15 in [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]:
Step 1: [1, 3, 5, 7, 9, |11|, 13, 15, 17, 19]
        15 > 11, search right →
Step 2: [13, 15, |17|, 19]
        15 < 17, search left ←
Step 3: [13, |15|]
        15 == 15, FOUND! ✓
Total comparisons: 3
Linear search would need: 8 comparisons
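We can verify those counts by instrumenting both algorithms to count comparisons. A sketch (the counting functions are our own, mirroring the slicing version of binary search above):

```python
def linear_comparisons(items, target):
    """Count equality checks a linear scan makes before finding target."""
    for count, value in enumerate(items, start=1):
        if value == target:
            return count
    return len(items)  # target absent: every element was checked

def binary_comparisons(items, target):
    """Count middle-element probes the slicing binary search makes."""
    if not items:
        return 0
    mid = len(items) // 2
    if items[mid] == target:
        return 1
    if target < items[mid]:
        return 1 + binary_comparisons(items[:mid], target)
    return 1 + binary_comparisons(items[mid + 1:], target)

numbers = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
print(linear_comparisons(numbers, 15))  # 8
print(binary_comparisons(numbers, 15))  # 3
```

The counts match the visualization: linear search walks 8 elements to reach 15, while binary search probes 11, 17, then 15.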
Alternative: Index-Based Approach (No Slicing)
Slicing creates new lists, which costs memory. A more efficient approach uses indices:
def binary_search_v3(sorted_list, target, left=0, right=None):
    """
    Binary search using indices instead of slicing.
    More efficient - no list copying.
    """
    # Initialize right on first call
    if right is None:
        right = len(sorted_list) - 1
    # Base case: search space exhausted
    if left > right:
        return -1
    # Find middle index
    mid = (left + right) // 2
    mid_value = sorted_list[mid]
    # Check middle element
    if mid_value == target:
        return mid
    elif target < mid_value:
        # Search left half
        return binary_search_v3(sorted_list, target, left, mid - 1)
    else:
        # Search right half
        return binary_search_v3(sorted_list, target, mid + 1, right)

# Test it
numbers = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
print(f"Sorted list: {numbers}")
print()
test_values = [7, 15, 1, 19, 8]
for val in test_values:
    idx = binary_search_v3(numbers, val)
    if idx != -1:
        print(f"Searching for {val}: index {idx} → numbers[{idx}] = {numbers[idx]} ✓")
    else:
        print(f"Searching for {val}: not found ✗")
Python Output:
Sorted list: [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
Searching for 7: index 3 → numbers[3] = 7 ✓
Searching for 15: index 7 → numbers[7] = 15 ✓
Searching for 1: index 0 → numbers[0] = 1 ✓
Searching for 19: index 9 → numbers[9] = 19 ✓
Searching for 8: not found ✗
How This Version Works:
Instead of slicing the list, we maintain left and right pointers:
- left = start of current search range
- right = end of current search range
- mid = middle of current range
Execution Trace for binary_search_v3([1, 3, 5, 7, 9, 11, 13, 15, 17, 19], 15):
Call 1: binary_search_v3([...], 15, left=0, right=9)
mid = (0 + 9) // 2 = 4
sorted_list[4] = 9
15 > 9, search right
Call 2: binary_search_v3([...], 15, left=5, right=9)
mid = (5 + 9) // 2 = 7
sorted_list[7] = 15
15 == 15, FOUND!
return 7
Return 7 ✓
Comparing Binary Search Approaches
| Aspect | v2 (Slicing + Offset) | v3 (Index-Based) |
|---|---|---|
| List copying | Yes (creates sublists) | No |
| Space complexity | O(n) total for sliced copies + O(log n) stack | O(log n) for stack only |
| Readability | More intuitive | Slightly more complex |
| Parameters | 2 (list, offset) | 3 (list, left, right) |
| Production use | Acceptable for small lists | Preferred for large lists |
Recommendation: Use v3 (index-based) in production code for better performance.
Complexity Analysis: Binary Search
Time Complexity: O(log n)
- Each step eliminates half the search space
- Maximum depth: log₂(n)
- Example: For n=1000, max steps = 10
Space Complexity:
- v2 (slicing): O(log n) for the call stack + O(n) total for the sliced copies (n/2 + n/4 + ... < n)
- v3 (indices): O(log n) for the call stack only
Recurrence Relation: T(n) = T(n/2) + O(1)
- Solves to O(log n) by the Master Theorem
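Unrolling the recurrence makes it concrete where the logarithm comes from:

```latex
\begin{aligned}
T(n) &= T(n/2) + c \\
     &= T(n/2^2) + 2c \\
     &\;\;\vdots \\
     &= T(n/2^k) + kc
\end{aligned}
```

The recursion bottoms out when n/2^k = 1, i.e. k = log₂(n), so T(n) = T(1) + c·log₂(n) = O(log n).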
Comparison with Linear Search:
| List Size | Linear Search | Binary Search |
|---|---|---|
| 10 | 10 | 4 |
| 100 | 100 | 7 |
| 1,000 | 1,000 | 10 |
| 1,000,000 | 1,000,000 | 20 |
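The comparison counts in the table can be checked with a small instrumented search. This sketch uses an iterative loop purely to make counting easy; the probe counts match the recursive version, and `count_probes` is our own helper name:

```python
import math

def count_probes(sorted_list, target):
    """Binary search that counts how many middle elements it inspects."""
    probes, left, right = 0, 0, len(sorted_list) - 1
    while left <= right:
        probes += 1
        mid = (left + right) // 2
        if sorted_list[mid] == target:
            return probes
        elif target < sorted_list[mid]:
            right = mid - 1
        else:
            left = mid + 1
    return probes  # target absent: the search ran to full depth

for n in [10, 100, 1000, 1000000]:
    data = list(range(n))
    worst = count_probes(data, n)  # n itself is absent, forcing the worst case
    print(f"n = {n:>7}: worst-case probes = {worst}, "
          f"ceil(log2(n+1)) = {math.ceil(math.log2(n + 1))}")
```

For n = 10, 100, 1000, and 1,000,000 this reports 4, 7, 10, and 20 probes, matching the table.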
Binary search is dramatically faster for large sorted lists!
When to Use Binary Search
Use binary search when:
- List is sorted (or can be sorted once)
- You'll perform many searches
- List is large (n > 100)
Use linear search when:
- List is unsorted and can't be sorted
- List is very small (n < 20)
- You only search once (sorting cost > search benefit)
Real-World Application: Finding Version Numbers
Let's apply binary search to a realistic scenario:
def find_version(versions, target_version):
    """
    Find a specific version in a sorted list of version numbers.
    Versions are strings like "1.2.3".
    """
    def version_to_tuple(v):
        """Convert version string to tuple for comparison."""
        return tuple(map(int, v.split('.')))

    # Convert the target once, rather than on every recursive call
    target_tuple = version_to_tuple(target_version)

    def binary_search_versions(left, right):
        if left > right:
            return -1
        mid = (left + right) // 2
        mid_version = version_to_tuple(versions[mid])
        if mid_version == target_tuple:
            return mid
        elif target_tuple < mid_version:
            return binary_search_versions(left, mid - 1)
        else:
            return binary_search_versions(mid + 1, right)

    return binary_search_versions(0, len(versions) - 1)
# Sorted list of software versions
versions = [
    "1.0.0", "1.0.1", "1.1.0", "1.2.0", "1.2.1",
    "2.0.0", "2.1.0", "2.1.1", "2.2.0", "3.0.0"
]
print("Available versions:", versions)
print()
test_versions = ["1.2.0", "2.1.1", "3.0.0", "1.5.0"]
for v in test_versions:
    idx = find_version(versions, v)
    if idx != -1:
        print(f"Version {v} found at index {idx}")
    else:
        print(f"Version {v} not found")
Python Output:
Available versions: ['1.0.0', '1.0.1', '1.1.0', '1.2.0', '1.2.1', '2.0.0', '2.1.0', '2.1.1', '2.2.0', '3.0.0']
Version 1.2.0 found at index 3
Version 2.1.1 found at index 7
Version 3.0.0 found at index 9
Version 1.5.0 not found
This demonstrates how binary search applies to real-world data like version numbers, dates, or any sortable data!
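For comparison, Python's standard library can produce the same result: precompute the version tuples once, then let `bisect.bisect_left` do the O(log n) search. This is a sketch; `find_version_bisect` and `version_key` are our own helper names:

```python
import bisect

versions = [
    "1.0.0", "1.0.1", "1.1.0", "1.2.0", "1.2.1",
    "2.0.0", "2.1.0", "2.1.1", "2.2.0", "3.0.0"
]

def version_key(v):
    """'1.2.3' -> (1, 2, 3), so versions compare numerically."""
    return tuple(map(int, v.split('.')))

# Convert every version once; bisect then searches the tuple list.
keys = [version_key(v) for v in versions]

def find_version_bisect(target):
    key = version_key(target)
    i = bisect.bisect_left(keys, key)  # leftmost insertion point
    return i if i < len(keys) and keys[i] == key else -1

print(find_version_bisect("1.2.0"))   # 3
print(find_version_bisect("1.5.0"))   # -1
```

The trade-off: a one-time O(n) key conversion buys a well-tested, non-recursive search for every lookup afterwards.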
Common Failure Modes and Debugging
Failure Mode Catalog
Let's document the common ways list searching functions fail and how to diagnose them.
Symptom: Wrong Indices Returned
Python output pattern:
Expected index: 5
Actual index: 2
Verification: list[2] ≠ target
Diagnostic clues:
- Returned index doesn't match target when verified
- Pattern: indices in right half are wrong, left half correct
- Or: all indices are off by a constant amount
Root cause: Not adjusting indices when recursing on sublists
Solution: Add offset tracking parameter or adjust indices from recursive results
Symptom: Infinite Recursion
Python output pattern:
RecursionError: maximum recursion depth exceeded
Diagnostic clues:
- Call stack shows same function repeated 1000+ times
- Parameters don't change or don't approach base case
- Base case never reached

Root cause:
- Missing base case
- Base case condition never becomes true
- Recursive call doesn't make progress toward base case

Solution:
- Add a proper base case (empty list check)
- Ensure each recursive call reduces the problem size
- Verify parameters approach the base case
Symptom: Missing Results
Python output pattern:
Found 2 occurrences
Expected 5 occurrences
Diagnostic clues:
- Some matches found, but not all
- Pattern: only finds matches at certain positions
- Nested matches might be missed

Root cause:
- Not recursing into nested structures
- Stopping after first match instead of continuing
- Not combining results from all recursive calls

Solution:
- Check element type before processing
- Accumulate ALL results, not just the first
- Recurse into nested structures
Symptom: Type Errors with Nested Lists
Python output pattern:
TypeError: 'in <string>' requires string as left operand, not list
Diagnostic clues:
- Error occurs when a list reaches code expecting a string
- Happens with nested list structures
- Error message mentions a type mismatch
Root cause: Treating a nested list (a subdirectory) as if it were a filename string, passing it to a string operation or comparison
Solution: Add type checking with isinstance() before comparison
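A minimal sketch of that fix (the `count_matches` helper and the directory-style data are illustrative, not part of the chapter's reference code):

```python
def count_matches(entries, target):
    """Count occurrences of target, descending into nested lists."""
    total = 0
    for entry in entries:
        if isinstance(entry, list):      # subdirectory: recurse, don't compare
            total += count_matches(entry, target)
        elif entry == target:            # plain filename: safe to compare
            total += 1
    return total

tree = ["app.py", ["utils", "app.py"], "README.md"]
print(count_matches(tree, "app.py"))  # 2
```

The `isinstance()` check routes lists to the recursive branch before any string operation can see them, which eliminates the TypeError at its source.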
Debugging Workflow for Recursive Search
Step 1: Verify the base case
Add print at base case:
def find_file_debug(filenames, target):
    if len(filenames) == 0:
        print("→ Base case: empty list")  # ← Add this
        return -1
    # ... rest of function
If you never see this message, your base case isn't being reached!
Step 2: Trace the recursive calls
Add prints showing parameters:
def find_file_debug(filenames, target, depth=0):
    indent = " " * depth
    print(f"{indent}find_file({filenames}, '{target}')")
    # ... rest of function
    # In recursive call:
    result = find_file_debug(filenames[1:], target, depth + 1)
This shows the call stack visually.
Step 3: Verify indices at each step
For index-returning functions:
def find_file_debug(filenames, target):
    # ... when match found:
    if filenames[0] == target:
        print(f"✓ Found '{target}' at index 0")
        print(f"  Verification: filenames[0] = '{filenames[0]}'")
        return 0
Step 4: Check index adjustments
When adjusting indices from recursive results:
result = find_file(filenames[1:], target)
print(f"→ Recursive result: {result}")
if result != -1:
    adjusted = result + 1
    print(f"→ Adjusted index: {adjusted}")
    return adjusted
Step 5: Manually trace with small input
Pick a tiny example and trace by hand:
# Use a 3-element list for manual tracing
test_list = ["a", "b", "c"]
target = "b"
# Expected:
# Call 1: ["a", "b", "c"], "b"
#   "a" ≠ "b", recurse on ["b", "c"]
# Call 2: ["b", "c"], "b"
#   "b" == "b", return 0
# Adjust: 0 + 1 = 1
# Return: 1 ✓
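Putting the five steps together, one possible fully instrumented version looks like this (the print format and `find_file_debug` name are a suggestion, not the chapter's canonical code):

```python
def find_file_debug(filenames, target, depth=0):
    """Linear search with tracing for each debugging step."""
    indent = "  " * depth
    print(f"{indent}find_file({filenames}, '{target}')")        # Step 2: trace calls
    if len(filenames) == 0:
        print(f"{indent}-> base case: empty list")              # Step 1: base case
        return -1
    if filenames[0] == target:
        print(f"{indent}-> found '{target}' at index 0")        # Step 3: verify match
        return 0
    result = find_file_debug(filenames[1:], target, depth + 1)
    if result == -1:
        return -1
    print(f"{indent}-> adjusted index: {result} + 1 = {result + 1}")  # Step 4
    return result + 1

print(find_file_debug(["a", "b", "c"], "b"))  # Step 5: small input, trace by hand -> 1
```

Running it on the three-element example prints the indented call tree and returns 1, matching the manual trace above.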
Decision Framework: Which Search Approach When?
Use this flowchart to choose the right search approach:
Is the list sorted?
├─ NO → Use linear search
│   ├─ Need all occurrences? → Use find_all with accumulator
│   ├─ Nested structure? → Use nested search with type checking
│   └─ Just first match? → Use basic linear search
│
└─ YES → Use binary search
    ├─ Large list (n > 100)? → Definitely binary search
    ├─ Many searches? → Binary search (amortize sorting cost)
    ├─ Small list (n < 20)? → Linear might be simpler
    └─ One-time search? → Consider if sorting cost is worth it
Performance Comparison: Real Measurements
Let's measure actual performance:
import sys
import time
import random

# The recursive linear search can exceed Python's default ~1000-frame
# recursion limit on the larger lists, so raise it for this experiment.
sys.setrecursionlimit(25000)

def measure_search_time(search_func, data, target, iterations=1000):
    """Measure average search time over multiple iterations."""
    start = time.perf_counter()  # perf_counter is more precise than time.time()
    for _ in range(iterations):
        search_func(data, target)
    end = time.perf_counter()
    return (end - start) / iterations * 1000  # Convert to milliseconds
# Create test data
sizes = [10, 100, 1000, 10000]
print("Performance Comparison: Linear vs Binary Search")
print("=" * 60)
for size in sizes:
    # Create sorted list
    data = list(range(0, size * 2, 2))  # [0, 2, 4, 6, ...]
    target = random.choice(data)  # Pick random target
    # Measure linear search
    linear_time = measure_search_time(
        lambda lst, t: find_file(lst, t),
        data,
        target,
        iterations=100
    )
    # Measure binary search
    binary_time = measure_search_time(
        lambda lst, t: binary_search_v3(lst, t),
        data,
        target,
        iterations=100
    )
    speedup = linear_time / binary_time
    print(f"\nList size: {size}")
    print(f"  Linear search: {linear_time:.4f} ms")
    print(f"  Binary search: {binary_time:.4f} ms")
    print(f"  Speedup: {speedup:.1f}x faster")
Python Output (approximate, varies by machine):
Performance Comparison: Linear vs Binary Search
============================================================
List size: 10
Linear search: 0.0012 ms
Binary search: 0.0015 ms
Speedup: 0.8x faster
List size: 100
Linear search: 0.0089 ms
Binary search: 0.0021 ms
Speedup: 4.2x faster
List size: 1000
Linear search: 0.0847 ms
Binary search: 0.0028 ms
Speedup: 30.3x faster
List size: 10000
Linear search: 0.8234 ms
Binary search: 0.0035 ms
Speedup: 235.3x faster
Key observations:
- For small lists (n=10), binary search is actually SLOWER due to overhead
- At n=100, binary search starts to win
- At n=1000+, binary search is dramatically faster
- Speedup grows as list size increases (logarithmic vs linear)
The Complete Journey: Summary and Synthesis
The Journey: From Problem to Solution
| Iteration | Problem | Technique Applied | Result | Complexity |
|---|---|---|---|---|
| Linear Search | | | | |
| 0 | Find single item | Basic recursion | Works for first match | O(n) time, O(n) space |
| 1 | Find all occurrences | Naive collection | Wrong indices | Still O(n) |
| 2 | Fix indices | Accumulator parameter | Correct indices | O(n) time, O(n) space |
| 3 | Handle nested lists | Type checking + structural recursion | Works on nested data | O(n) time, O(d) space |
| Binary Search | | | | |
| 1 | Search sorted list | Divide-and-conquer | Wrong indices | O(log n) time |
| 2 | Fix indices | Offset tracking | Correct indices | O(log n) time, O(log n) space |
| 3 | Optimize memory | Index-based (no slicing) | Production-ready | O(log n) time, O(log n) space |
Final Implementations
Linear Search: Finding All Occurrences
def find_all_occurrences(items, target, current_index=0):
    """
    Find all indices where target appears in a flat list.
    Time: O(n), Space: O(n)
    """
    if len(items) == 0:
        return []
    if items[0] == target:
        rest_results = find_all_occurrences(items[1:], target, current_index + 1)
        return [current_index] + rest_results
    else:
        return find_all_occurrences(items[1:], target, current_index + 1)
Nested Search: Finding in Hierarchical Structures
def find_in_nested(structure, target):
    """
    Find all paths to target in nested list structure.
    Time: O(n), Space: O(d + k)
    where n = total elements, d = max depth, k = matches
    """
    results = []
    for i, element in enumerate(structure):
        if isinstance(element, list):
            nested_results = find_in_nested(element, target)
            for path in nested_results:
                results.append([i] + path)
        else:
            if element == target:
                results.append([i])
    return results
Binary Search: Efficient Search in Sorted Lists
def binary_search(sorted_list, target, left=0, right=None):
    """
    Binary search using indices (no slicing).
    Time: O(log n), Space: O(log n)
    """
    if right is None:
        right = len(sorted_list) - 1
    if left > right:
        return -1
    mid = (left + right) // 2
    mid_value = sorted_list[mid]
    if mid_value == target:
        return mid
    elif target < mid_value:
        return binary_search(sorted_list, target, left, mid - 1)
    else:
        return binary_search(sorted_list, target, mid + 1, right)
Decision Framework: Choosing Your Search Strategy
Question 1: Is your data sorted?
- YES → Use binary search (O(log n))
- NO → Continue to Question 2

Question 2: Is your data nested?
- YES → Use nested search with type checking
- NO → Continue to Question 3

Question 3: Do you need all occurrences?
- YES → Use find_all with accumulator
- NO → Use basic linear search

Question 4: How large is your dataset?
- Small (n < 20) → Linear search is fine, simpler code
- Medium (20 < n < 1000) → Consider binary if sorted
- Large (n > 1000) → Binary search essential if sorted

Question 5: How many searches will you perform?
- One-time → Linear might be simpler
- Many searches → Sort once, then use binary search
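The five questions can be collapsed into a tiny helper. This is an illustrative sketch; the name `choose_search_strategy` and the exact thresholds are taken from the framework above, not from a library:

```python
def choose_search_strategy(is_sorted, is_nested, need_all, n, many_searches):
    """Map the five decision questions onto a recommended approach."""
    if is_sorted and (n >= 20 or many_searches):
        return "binary search"              # Q1/Q4/Q5: sorted and worth the setup
    if is_nested:
        return "nested search"              # Q2: hierarchical data
    if need_all:
        return "find_all with accumulator"  # Q3: every occurrence matters
    return "linear search"                  # default: simple and always correct

print(choose_search_strategy(True, False, False, 1000, False))  # binary search
print(choose_search_strategy(False, True, False, 50, False))    # nested search
```

Encoding the decision this way makes the framework testable: each branch corresponds to one question, so a change in thresholds is a one-line edit.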
Complexity Summary
| Algorithm | Time | Space | Best For |
|---|---|---|---|
| Linear search (single) | O(n) | O(n) | Unsorted, small lists |
| Linear search (all) | O(n) | O(n) | Finding all matches |
| Nested search | O(n) | O(d) | Hierarchical data |
| Binary search (slicing) | O(log n) | O(log n) stack + O(n) slices | Sorted lists, learning |
| Binary search (indices) | O(log n) | O(log n) | Sorted lists, production |
Where:
- n = number of elements
- d = maximum nesting depth
- k = number of matches
Lessons Learned
1. Index tracking is crucial in recursive list processing
   - Always account for position in the original list
   - Use accumulator parameters or adjust results
   - Verify indices with actual list access
2. Type checking enables structural recursion
   - Use isinstance() to handle mixed types
   - Recurse into nested structures
   - Build paths to preserve structure information
3. Binary search requires careful index management
   - Slicing loses position information
   - Index-based approach is more efficient
   - Always verify the base case (left > right)
4. Choose the right tool for the job
   - Linear search: simple, works on any list
   - Binary search: fast, requires sorted data
   - Nested search: handles hierarchical structures
5. Recursion shines for hierarchical data
   - Natural fit for nested structures
   - Clearer than iterative approaches with explicit stacks
   - Preserves structure in results (paths)
Practice Problems
Try implementing these variations:
1. Find first occurrence after index k
   - Modify linear search to start at position k
   - Useful for "find next" operations
2. Binary search for insertion point
   - Return where target SHOULD be if not found
   - Used in sorted list insertion
3. Find all in range
   - Find all elements between min and max values
   - Combine binary search with linear scan
4. Nested search with depth limit
   - Only search up to depth d
   - Useful for limiting recursion depth
5. Case-insensitive search
   - Modify to ignore case when comparing strings
   - Handle both flat and nested lists
Next Steps
You've now mastered list searching with recursion! You've learned:
✓ Linear search with the "first + rest" pattern
✓ Finding all occurrences with accumulator parameters
✓ Searching nested structures with type checking
✓ Binary search with divide-and-conquer
✓ Index management in recursive functions
✓ Performance analysis and algorithm selection
Coming up in Module 3: We'll dive deeper into the divide-and-conquer paradigm, exploring how splitting problems in half leads to efficient algorithms like merge sort and quick sort. Binary search was just the beginning!